In light of unprecedented increases in the popularity of the internet and social media, comment moderation has never been a more relevant task. Semi-automated comment moderation systems greatly aid human moderators by either automatically classifying the examples or allowing the moderators to prioritize which comments to consider first. However, the concept of inappropriate content is often subjective, and such content can be conveyed in many subtle and indirect ways. In this work, we propose CoRAL -- a language and culturally aware Croatian Abusive dataset covering phenomena of implicitness and reliance on local and global context. We show experimentally that current models degrade when comments are not explicit and further degrade when language skill and context knowledge are required to interpret the comment.
translated by 谷歌翻译
Large language models (LLMs) have demonstrated impressive capabilities in natural language understanding and generation, but the quality bar for medical and clinical applications is high. Today, attempts to assess models' clinical knowledge typically rely on automated evaluations on limited benchmarks. There is no standard to evaluate model predictions and reasoning across a breadth of tasks. To address this, we present MultiMedQA, a benchmark combining six existing open question answering datasets spanning professional medical exams, research, and consumer queries; and HealthSearchQA, a new free-response dataset of medical questions searched online. We propose a framework for human evaluation of model answers along multiple axes including factuality, precision, possible harm, and bias. In addition, we evaluate PaLM (a 540-billion parameter LLM) and its instruction-tuned variant, Flan-PaLM, on MultiMedQA. Using a combination of prompting strategies, Flan-PaLM achieves state-of-the-art accuracy on every MultiMedQA multiple-choice dataset (MedQA, MedMCQA, PubMedQA, MMLU clinical topics), including 67.6% accuracy on MedQA (US Medical License Exam questions), surpassing prior state-of-the-art by over 17%. However, human evaluation reveals key gaps in Flan-PaLM responses. To resolve this we introduce instruction prompt tuning, a parameter-efficient approach for aligning LLMs to new domains using a few exemplars. The resulting model, Med-PaLM, performs encouragingly, but remains inferior to clinicians. We show that comprehension, recall of knowledge, and medical reasoning improve with model scale and instruction prompt tuning, suggesting the potential utility of LLMs in medicine. Our human evaluations reveal important limitations of today's models, reinforcing the importance of both evaluation frameworks and method development in creating safe, helpful LLM models for clinical applications.
translated by 谷歌翻译
We study the sample complexity of reducing reinforcement learning to a sequence of empirical risk minimization problems over the policy space. Such reductions-based algorithms exhibit local convergence in the function space, as opposed to the parameter space for policy gradient algorithms, and thus are unaffected by the possibly non-linear or discontinuous parameterization of the policy class. We propose a variance-reduced variant of Conservative Policy Iteration that improves the sample complexity of producing a $\varepsilon$-functional local optimum from $O(\varepsilon^{-4})$ to $O(\varepsilon^{-3})$. Under state-coverage and policy-completeness assumptions, the algorithm enjoys $\varepsilon$-global optimality after sampling $O(\varepsilon^{-2})$ times, improving upon the previously established $O(\varepsilon^{-3})$ sample requirement.
translated by 谷歌翻译
This paper proposes a novel controller framework that provides trajectory tracking for an Aerial Manipulator (AM) while ensuring the safe operation of the system under unknown bounded disturbances. The AM considered here is a 2-DOF (degrees-of-freedom) manipulator rigidly attached to a UAV. Our proposed controller structure follows the conventional inner loop PID control for attitude dynamics and an outer loop controller for tracking a reference trajectory. The outer loop control is based on the Model Predictive Control (MPC) with constraints derived using the Barrier Lyapunov Function (BLF) for the safe operation of the AM. BLF-based constraints are proposed for two objectives, viz. 1) To avoid the AM from colliding with static obstacles like a rectangular wall, and 2) To maintain the end effector of the manipulator within the desired workspace. The proposed BLF ensures that the above-mentioned objectives are satisfied even in the presence of unknown bounded disturbances. The capabilities of the proposed controller are demonstrated through high-fidelity non-linear simulations with parameters derived from a real laboratory scale AM. We compare the performance of our controller with other state-of-the-art MPC controllers for AM.
translated by 谷歌翻译
Virtual Product placement(VPP) is the advertising technique of digitally placing a branded object into the scene of a movie or TV show. This type of advertising provides the ability for brands to reach consumers without interrupting the viewing experience with a commercial break, as the products are seen in the background or as props. Despite this being a billion-dollar industry, ad rendering technique is currently executed at post production stage, manually either with the help of VFx artists or through semi-automated solutions. In this paper, we demonstrate a fully automated framework to digitally place 2-D ads in linear TV cooking shows captured using single-view camera with small camera movements. Without access to full video or production camera configuration, this framework performs the following tasks (i) identifying empty space for 2-D ad placement (ii) kitchen scene understanding (iii) occlusion handling (iv) ambient lighting and (v) ad tracking.
translated by 谷歌翻译
We propose a trust-region stochastic sequential quadratic programming algorithm (TR-StoSQP) to solve nonlinear optimization problems with stochastic objectives and deterministic equality constraints. We consider a fully stochastic setting, where in each iteration a single sample is generated to estimate the objective gradient. The algorithm adaptively selects the trust-region radius and, compared to the existing line-search StoSQP schemes, allows us to employ indefinite Hessian matrices (i.e., Hessians without modification) in SQP subproblems. As a trust-region method for constrained optimization, our algorithm needs to address an infeasibility issue -- the linearized equality constraints and trust-region constraints might lead to infeasible SQP subproblems. In this regard, we propose an \textit{adaptive relaxation technique} to compute the trial step that consists of a normal step and a tangential step. To control the lengths of the two steps, we adaptively decompose the trust-region radius into two segments based on the proportions of the feasibility and optimality residuals to the full KKT residual. The normal step has a closed form, while the tangential step is solved from a trust-region subproblem, to which a solution ensuring the Cauchy reduction is sufficient for our study. We establish the global almost sure convergence guarantee for TR-StoSQP, and illustrate its empirical performance on both a subset of problems in the CUTEst test set and constrained logistic regression problems using data from the LIBSVM collection.
translated by 谷歌翻译
The use of emojis affords a visual modality to, often private, textual communication. The task of predicting emojis however provides a challenge for machine learning as emoji use tends to cluster into the frequently used and the rarely used emojis. Much of the machine learning research on emoji use has focused on high resource languages and has conceptualised the task of predicting emojis around traditional server-side machine learning approaches. However, traditional machine learning approaches for private communication can introduce privacy concerns, as these approaches require all data to be transmitted to a central storage. In this paper, we seek to address the dual concerns of emphasising high resource languages for emoji prediction and risking the privacy of people's data. We introduce a new dataset of $118$k tweets (augmented from $25$k unique tweets) for emoji prediction in Hindi, and propose a modification to the federated learning algorithm, CausalFedGSD, which aims to strike a balance between model performance and user privacy. We show that our approach obtains comparative scores with more complex centralised models while reducing the amount of data required to optimise the models and minimising risks to user privacy.
translated by 谷歌翻译
室内运动计划的重点是解决通过混乱环境导航代理的问题。迄今为止,在该领域已经完成了很多工作,但是这些方法通常无法找到计算廉价的在线路径计划和路径最佳之间的最佳平衡。除此之外,这些作品通常证明是单一启动单目标世界的最佳性。为了应对这些挑战,我们为在未知室内环境中进行导航的多个路径路径计划者和控制器堆栈,在该环境中,路点将目标与机器人必须在达到目标之前必须穿越的中介点一起。我们的方法利用全球规划师(在任何瞬间找到下一个最佳航路点),本地规划师(计划通往特定航路点的路径)以及自适应模型预测性控制策略(用于强大的系统控制和更快的操作) 。我们在一组随机生成的障碍图,中间航路点和起始目标对上评估了算法,结果表明计算成本显着降低,具有高度准确性和可靠的控制。
translated by 谷歌翻译
至于其他形式的AI,最近已经对不同用户同伙的性能差异进行了研究。在语音识别方面实现公平性的一种方法是(1)确定遭受低标准表现的说话者队列,以及(2)采取针对发现同类的公平性缓解措施。在本文中,我们使用产品规模的AI助手语音识别系统的数据报告了发现和缓解性能差异的初步发现。我们将基于地理和人口统计学信息的队列发现与一种更可扩展的方法进行比较,该方法将使用扬声器嵌入技术分组没有人类标签的说话者。为了缓解公平性,我们发现对代表性不足的队列的过度采样,以及通过其他输入变量对扬声器队列的建模,从而减少了表现和底部性能队列之间的差距,而不会降低整体识别精度。
translated by 谷歌翻译
人工智能的最新趋势是将验证的模型用于语言和视觉任务,这些模型已经实现了非凡的表现,但也令人困惑。因此,以各种方式探索这些模型的能力对该领域至关重要。在本文中,我们探讨了模型的可靠性,在其中我们将可靠的模型定义为一个不仅可以实现强大的预测性能,而且在许多涉及不确定性(例如选择性预测,开放式设置识别)的决策任务上,在许多决策任务上表现出色,而且表现良好。强大的概括(例如,准确性和适当的评分规则,例如在分布数据集中和分发数据集上的对数可能性)和适应性(例如,主动学习,几乎没有射击不确定性)。我们设计了40个数据集的10种任务类型,以评估视觉和语言域上可靠性的不同方面。为了提高可靠性,我们分别开发了VIT-PLEX和T5-PLEX,分别针对视觉和语言方式扩展了大型模型。 PLEX极大地改善了跨可靠性任务的最先进,并简化了传统协议,因为它可以改善开箱即用的性能,并且不需要设计分数或为每个任务调整模型。我们演示了高达1B参数的模型尺寸的缩放效果,并预处理数据集大小最多4B示例。我们还展示了PLEX在具有挑战性的任务上的功能,包括零射门的开放式识别,主动学习和对话语言理解中的不确定性。
translated by 谷歌翻译